Build an IdempotencyInterceptor that reads the Idempotency-Key request header and looks up a stored response in a cache. If one is found, return it immediately and skip the handler. On the first request, let the handler run and store its response in the cache inside tap(). Make the cache write fire-and-forget (do not await it, and catch its errors) so the response is never delayed.
The Idempotency-Key header must be unique per logical operation — clients generate it (e.g. UUID v4).
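Cache the status code together with the response body, because a replay must return exactly the same body and status. In a NestJS-style interceptor the value seen by tap() is only the handler's return value, so read the status from the response object before storing.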
Set an Idempotency-Replayed: true header on replayed responses so clients can detect duplicates.
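Apply only to mutating methods (POST, PUT, PATCH). GET is safe and idempotent by definition. PUT is also idempotent under HTTP semantics, so covering it is optional and mainly avoids repeating side effects on retries.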
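Use a distributed cache (Redis) in production. An in-memory cache is local to each process, so on a multi-instance deployment a retry routed to a different instance misses the stored response and runs the handler again.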
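
The flow and rules above can be sketched without any framework. This is a minimal illustration, not the interceptor itself: it uses plain Promises in place of RxJS tap() so it runs without dependencies, and every name in it (HttpRequest, HttpResponse, ResponseCache, InMemoryCache, withIdempotency) is hypothetical. It also assumes header names arrive lowercased, as Node's HTTP layer delivers them.

```typescript
// Framework-free sketch. All types and names here are illustrative assumptions.
interface HttpRequest {
  method: string;
  headers: Record<string, string | undefined>; // assumed lowercased header names
}

interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

// Minimal async cache contract. A Redis-backed class could implement the same
// two methods in production; the in-memory version is for single-process use.
interface ResponseCache {
  get(key: string): Promise<HttpResponse | undefined>;
  set(key: string, value: HttpResponse): Promise<void>;
}

class InMemoryCache implements ResponseCache {
  private readonly entries = new Map<string, HttpResponse>();

  async get(key: string): Promise<HttpResponse | undefined> {
    return this.entries.get(key);
  }

  async set(key: string, value: HttpResponse): Promise<void> {
    this.entries.set(key, value);
  }
}

type Handler = (req: HttpRequest) => Promise<HttpResponse>;

const MUTATING_METHODS = new Set(["POST", "PUT", "PATCH"]);

function withIdempotency(cache: ResponseCache, handler: Handler): Handler {
  return async (req) => {
    const key = req.headers["idempotency-key"];

    // Pass through requests without a key and non-mutating methods such as GET.
    if (!key || !MUTATING_METHODS.has(req.method.toUpperCase())) {
      return handler(req);
    }

    const stored = await cache.get(key);
    if (stored) {
      // Replay: same status and body, plus the marker header. The handler is skipped.
      return {
        ...stored,
        headers: { ...stored.headers, "Idempotency-Replayed": "true" },
      };
    }

    const response = await handler(req);

    // Fire-and-forget: the write is not awaited, and a failure is only logged,
    // so a slow or failing cache never delays or breaks the response.
    void cache
      .set(key, response)
      .catch((err) => console.error("idempotency cache write failed", err));

    return response;
  };
}
```

In an actual interceptor the same three steps (lookup, short-circuit, non-awaited store) would sit around the handler's observable, with the store step inside tap(). The cache interface is kept to two methods so that swapping the in-memory class for a distributed one does not touch the wrapper.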